Fourier component
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Zhou, Tianyi, Fu, Deqing, Sharan, Vatsal, Jia, Robin
Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain. Within the model, MLP and attention layers use Fourier features in complementary ways: MLP layers primarily approximate the magnitude of the answer using low-frequency features, while attention layers primarily perform modular addition (e.g., computing whether the answer is even or odd) using high-frequency features. Pre-training is crucial for this mechanism: models trained from scratch to add numbers only exploit low-frequency features, leading to lower accuracy. Introducing pre-trained token embeddings to a randomly initialized model rescues its performance. Overall, our analysis demonstrates that appropriate pre-trained representations (e.g., Fourier features) can unlock the ability of Transformers to learn precise mechanisms for algorithmic tasks.
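As a rough illustration of the mechanism this abstract describes, the sketch below builds a toy set of sinusoidal number features in numpy and shows how a low-frequency feature pair can recover approximate magnitude while a high-frequency (period-2) pair recovers parity, i.e. addition mod 2. The frequencies and the 4-dimensional layout are illustrative assumptions, not the paper's actual probing setup.

```python
import numpy as np

# Toy illustration (not the paper's code): represent integers 0..99 with a few
# sinusoidal "Fourier features" and show how low- and high-frequency components
# split the work of addition, as the abstract describes.
N = 100
n = np.arange(N)

# Hypothetical feature set: one low-frequency pair (tracks magnitude) and one
# high-frequency pair with period 2 (tracks parity).
low_freq = 1.0 / N          # slow oscillation over the number range
high_freq = 1.0 / 2         # period-2 oscillation: even vs. odd
features = np.stack([
    np.cos(2 * np.pi * low_freq * n),
    np.sin(2 * np.pi * low_freq * n),
    np.cos(2 * np.pi * high_freq * n),   # +1 for even, -1 for odd
    np.sin(2 * np.pi * high_freq * n),
], axis=1)                               # shape (N, 4)

# The low-frequency pair recovers approximate magnitude via its phase angle.
angle = np.arctan2(features[:, 1], features[:, 0]) % (2 * np.pi)
approx_magnitude = angle / (2 * np.pi * low_freq)
print(np.allclose(approx_magnitude, n, atol=1e-6))    # True

# The high-frequency cosine recovers parity exactly.
parity = (features[:, 2] < 0).astype(int)              # 1 if odd
print(np.array_equal(parity, n % 2))                   # True
```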
Mechanistic Interpretability of Binary and Ternary Transformers
Recent research (arXiv:2310.11453, arXiv:2402.17764) has proposed binary and ternary transformer networks as a way to significantly reduce memory and improve inference speed in Large Language Models (LLMs) while maintaining accuracy. In this work, we apply techniques from mechanistic interpretability to investigate whether such networks learn algorithms that differ meaningfully from those of full-precision transformer networks. In particular, we reverse engineer the algorithms learned for the toy problem of modular addition, where we find that binary and ternary networks learn algorithms similar to those of full-precision networks. This provides evidence against the possibility of using binary and ternary networks as a more interpretable alternative in the LLM setting.
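For context on what "ternary" means here, the following is a minimal sketch of absmean-style ternary weight quantization in the spirit of the cited 1.58-bit line of work (arXiv:2402.17764). The function name and tensor shapes are illustrative assumptions, and this is not the authors' implementation.

```python
import numpy as np

# Illustrative ternary quantization: scale weights by their mean absolute value,
# then round each entry to the nearest value in {-1, 0, +1}.
def ternarize(w: np.ndarray, eps: float = 1e-8):
    scale = np.abs(w).mean() + eps
    w_ternary = np.clip(np.round(w / scale), -1, 1)
    return w_ternary, scale          # dequantize as w_ternary * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, s = ternarize(w)
print(np.unique(w_q))                # subset of [-1., 0., 1.]
print(np.abs(w - w_q * s).mean())    # average quantization error
```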
Progress measures for grokking via mechanistic interpretability
Nanda, Neel, Chan, Lawrence, Lieberum, Tom, Smith, Jess, Steinhardt, Jacob
Neural networks often exhibit emergent behavior, where qualitatively new capabilities arise from scaling up the number of parameters, training data, or training steps. One approach to understanding emergence is to find continuous \textit{progress measures} that underlie the seemingly discontinuous qualitative changes. We argue that progress measures can be found via mechanistic interpretability: reverse-engineering learned behaviors into their individual components. As a case study, we investigate the recently discovered phenomenon of ``grokking'' exhibited by small transformers trained on modular addition tasks. We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle. We confirm the algorithm by analyzing the activations and weights and by performing ablations in Fourier space. Based on this understanding, we define progress measures that allow us to study the dynamics of training and split training into three continuous phases: memorization, circuit formation, and cleanup. Our results show that grokking, rather than being a sudden shift, arises from the gradual amplification of structured mechanisms encoded in the weights, followed by the later removal of memorizing components.
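The "addition as rotation" algorithm this abstract describes can be written down directly. The sketch below reproduces it for modular addition with a handful of hand-picked key frequencies; in the trained networks these frequencies are learned, so the specific values here are assumptions for illustration.

```python
import numpy as np

# Sketch of the "clock" algorithm: embed each residue as angles on the circle,
# combine a and b via cos(w(a+b)) = cos(wa)cos(wb) - sin(wa)sin(wb), and read
# out the answer c that maximizes cos(w(a + b - c)) summed over key frequencies.
p = 113                                      # modulus used in the grokking setup
ws = 2 * np.pi * np.array([1, 7, 19]) / p    # arbitrary "key frequencies"

def mod_add(a: int, b: int) -> int:
    cs = np.arange(p)
    # Sum over frequencies of cos(w(a + b - c)); this peaks exactly at c = (a + b) mod p.
    logits = sum(np.cos(w * (a + b - cs)) for w in ws)
    return int(np.argmax(logits))

assert mod_add(57, 80) == (57 + 80) % p
```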
Learning in RKHM: a $C^*$-Algebraic Twist for Kernel Machines
Hashimoto, Yuka, Ikeda, Masahiro, Kadri, Hachem
Supervised learning in reproducing kernel Hilbert spaces (RKHSs) has been actively investigated since the early 1990s (Murphy, 2012; Christmann & Steinwart, 2008; Shawe-Taylor & Cristianini, 2004; Schölkopf & Smola, 2002; Boser et al., 1992). The notion of reproducing kernels as dot products in Hilbert spaces was first brought to the field of machine learning by Aizerman et al. (1964), while the theoretical foundation of reproducing kernels and their Hilbert spaces dates back to at least Aronszajn (1950). By virtue of the representer theorem (Schölkopf et al., 2001), we can compute the solution of an infinite-dimensional minimization problem in an RKHS from a given finite set of samples. In addition to standard RKHSs, applying vector-valued RKHSs (vvRKHSs) to supervised learning has also been proposed and used for analyzing vector-valued data (Micchelli & Pontil, 2005; Álvarez et al., 2012; Kadri et al., 2016; Minh et al., 2016; Brouard et al., 2016; Laforgue et al., 2020; Huusari & Kadri, 2021). Generalization bounds for supervised learning problems in RKHSs and vvRKHSs have also been derived (Mohri et al., 2018; Caponnetto & De Vito, 2007; Audiffren & Kadri, 2013; Huusari & Kadri, 2021).
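The representer theorem mentioned above is what makes RKHS learning computable from finite samples: the minimizer is a kernel expansion over the training points. The sketch below shows this in the classical scalar-valued RKHS setting via kernel ridge regression; the paper itself works in the more general C*-algebra-valued RKHM setting, which this toy example does not cover.

```python
import numpy as np

# Kernel ridge regression: by the representer theorem, the RKHS minimizer is
# f(x) = sum_i alpha_i k(x_i, x), so we only solve for n coefficients alpha.
def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

lam = 1e-2
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)    # representer coefficients

X_test = np.linspace(-3, 3, 5)[:, None]
f_test = rbf_kernel(X_test, X) @ alpha                  # f(x) = sum_i alpha_i k(x, x_i)
print(f_test)
```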
Diffusion Variational Autoencoders
Rey, Luis A. Pérez, Menkovski, Vlado, Portegies, Jacobus W.
A standard Variational Autoencoder, with a Euclidean latent space, is structurally incapable of capturing topological properties of certain datasets. To remove topological obstructions, we introduce Diffusion Variational Autoencoders with arbitrary manifolds as latent spaces. A Diffusion Variational Autoencoder uses transition kernels of Brownian motion on the manifold. In particular, it uses properties of the Brownian motion to implement the reparametrization trick and fast approximations to the KL divergence. We show that the Diffusion Variational Autoencoder is capable of capturing topological properties of synthetic datasets. Additionally, we train the model on MNIST with spheres, tori, projective spaces, SO(3), and a torus embedded in R^3 as latent spaces. Although a natural dataset like MNIST does not have latent variables with a clear-cut topological structure, training on a manifold latent space can still highlight topological and geometrical properties.
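One simple way to picture the reparametrized sampling the abstract refers to is a projected random walk: take small Gaussian steps in the ambient space and renormalize back onto the manifold, which approximates a Brownian-motion transition kernel on the sphere. The sketch below is a hedged illustration of that idea and may differ from the paper's exact construction.

```python
import numpy as np

# Approximate sample from a Brownian-motion transition kernel on the unit
# sphere S^2, started at mu with "time" t, via small projected Gaussian steps.
def brownian_step_on_sphere(mu, t, n_steps=20, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x = mu / np.linalg.norm(mu)
    step_std = np.sqrt(t / n_steps)
    for _ in range(n_steps):
        x = x + step_std * rng.normal(size=x.shape)   # Euclidean increment
        x = x / np.linalg.norm(x)                     # project back onto S^2
    return x

mu = np.array([0.0, 0.0, 1.0])       # "mean" direction predicted by an encoder
z = brownian_step_on_sphere(mu, t=0.1)
print(z, np.linalg.norm(z))           # a point on the sphere near mu
```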
Stability of the Stochastic Gradient Method for an Approximated Large Scale Kernel Machine
Samareh, Aven, Parizi, Mahshid Salemi
In this paper we measure the stability of the stochastic gradient method (SGM) for learning an approximated Fourier primal support vector machine. The stability of an algorithm is assessed through its generalization error, measured as the absolute difference between the test and the training error. Our problem is to learn an approximated kernel function using random Fourier features for a binary classification problem in an online convex optimization setting. For a convex, Lipschitz continuous, and smooth loss function, the stochastic gradient method is stable given a reasonable number of iterations. We show that, with high probability, SGM generalizes well for an approximated kernel under the given assumptions. We empirically verify the theoretical findings for different parameters using several datasets.
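To make the setup concrete, the sketch below maps inputs through random Fourier features approximating an RBF kernel and runs plain stochastic gradient steps on a convex, Lipschitz, smooth loss. The choice of logistic loss, the step-size schedule, and the toy data are assumptions for illustration, not the paper's exact primal SVM formulation.

```python
import numpy as np

# Random Fourier features z(x) = sqrt(2/D) cos(Wx + b) approximating an RBF
# kernel, followed by stochastic gradient steps on the logistic loss.
rng = np.random.default_rng(0)
n, d, D = 500, 5, 200
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])            # toy binary labels in {-1, +1}

gamma = 0.5                                      # RBF bandwidth parameter
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)         # random Fourier feature map

w = np.zeros(D)
lr = 0.5
for t in range(5 * n):                           # SGM: one random sample per step
    i = rng.integers(n)
    margin = y[i] * (Z[i] @ w)
    grad = -y[i] * Z[i] / (1.0 + np.exp(margin))  # gradient of logistic loss
    w -= lr / np.sqrt(t + 1) * grad

acc = np.mean(np.sign(Z @ w) == y)
print(f"training accuracy: {acc:.2f}")
```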